Method and system for adaptive prefetching

ABSTRACT

A cache server prefetches one or more web pages from an origin server prior to those web pages being requested by a user. The cache server determines which web pages to prefetch based on a graph associated with a prefetch module associated with the cache server. The graph represents all or a portion of the web pages at the origin server using one or more nodes and one or more links connecting the nodes. Each link has an associated transaction weight and user weight. The transaction weight represents the importance of the link and associated web page to the origin server and may be used to control the prefetching of web pages by the cache server. The user weight may be used to change a priority associated with a request for a web page. The user weight and transaction weight may change based on criteria associated with the origin server.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/608,178 filed Sep. 10, 2012 and now U.S. Pat. No. ______, which is a continuation of U.S. application Ser. No. 13/079,557 filed Apr. 4, 2011 and now U.S. Pat. No. 8,275,778, which is a continuation of U.S. application Ser. No. 11/534,971 filed Sep. 25, 2006 and now U.S. Pat. No. 7,921,117, which is a continuation of U.S. application Ser. No. 09/731,365 filed Dec. 6, 2000 and now U.S. Pat. No. 7,113,935, all of which are hereby incorporated by reference herein.

TECHNICAL FIELD OF THE INVENTION

This invention relates in general to data processing systems and, more particularly, to a method and apparatus for adaptive prefetching.

BACKGROUND OF THE INVENTION

As computers have grown increasingly important in today's society, the importance of public and private networks and, especially, the Internet has also increased. As increasing numbers of users access the Internet, the need for efficient use of bandwidth has also increased. The increasing numbers of requests handled by the Internet are increasing the delay experienced by a user between generating a request and receiving a response to the request because of bandwidth limitations.

One traditional solution to decreasing overall bandwidth usage and decreasing the delay experienced by the user has involved caching previously requested content at the user's computer for faster retrieval. A related traditional solution has involved caching previously requested content for multiple users at a single cache server. Another traditional solution has involved increasing the bandwidth of the network connection between the Internet, the user and the web servers handling the requests. However, traditional solutions have often failed as the number of requests continues to increase and overload single cache servers and because of the expense associated with maintaining large numbers of high speed connections to the Internet. In addition, traditional solutions have often failed to provide for distinguishing the relative importance of web pages.

SUMMARY OF THE INVENTION

Other embodiments, technical advantages, features, and aspects will be apparent to one of ordinary skill in the art from the following figures, descriptions, and claims. One aspect of the present invention involves a method for data processing comprising receiving a web page request. The web page request requests a first web page. The first web page is associated with an origin server. The method further comprises associating the first web page with a first node in a prefetch graph and associating a respective second node in the prefetch graph with each of a plurality of second web pages associated with the first web page. The method further comprises generating at least one link in the prefetch graph between the first node and each of the second nodes. Each link has a respective associated user weight and a respective associated transaction weight. The method further comprises selecting at least one of the second web pages to retrieve based on the graph, and storing the selected second web pages at a cache server.

Another aspect of the present invention involves a method for data processing comprising receiving a web page request for a first web page. The web page request has an associated origination web page. The method further comprises associating an origination node in a prefetch graph with the origination web page and associating a first node in the prefetch graph with the first web page. The first web page is associated with the origination web page. The method further comprises updating a first link between the origination node and the first node. The first link has an associated first user weight and an associated first transaction weight. The method further comprises associating a second node in the prefetch graph with each of a plurality of second web pages associated with the first web page and generating a respective second link in the prefetch graph between the first node and each of the second nodes. Each second link has an associated second user weight and an associated second transaction weight. The method further comprises selecting a second web page to retrieve based on the transaction weight, and storing the second web page at a cache server.

A further aspect of the present invention involves a system for data processing comprising a memory coupled to a processor and an application stored in the memory. The application is operable to receive a web page request for a first web page. The web page request has an associated origination web page. The application is further operable to associate an origination node in a prefetch graph with the origination web page and associate a first node in the prefetch graph with the first web page. The first web page is associated with the origination web page. The application is further operable to associate a first link in the prefetch graph with a hypertext link from the origination web page to the first web page and associate a transaction weight with the first link based on prefetch criteria associated with an origin server associated with the prefetch graph. The application is further operable to associate a user weight with the first link based on the prefetch criteria, retrieve the first web page, and store the first web page.

The present invention provides various technical advantages. Various embodiments of the invention may have none, some, or all of these advantages. One such technical advantage is the capability for prefetching web pages from an origin server to a cache server and storing the prefetched web pages at the cache server. In addition, the web pages may be prefetched and stored at the user's computer. Prefetching of web pages can provide a user increased performance by providing the requested web page from the cache server and/or the user's computer instead of the origin server. Another technical advantage is the capability of the cache server to maintain a graph of web pages and hypertext links associated with the origin server. A transaction weight and a user weight may be associated with links between the web pages on the origin server. The transaction weight may be used to control the prefetching of the web pages by the cache server. The user weight may be used to increase or decrease the priority associated with a request for a web page from the origin server. Yet another technical advantage is the capability to update the user and transaction weights depending on criteria specified by an administrator associated with the origin server. For example, the transaction weight and/or user weight associated with a hypertext link may be increased or decreased in response to the popularity of the web page or the relative importance of the link.

BRIEF DESCRIPTION OF THE DRAWINGS

A better understanding of the present invention can be realized from the detailed description that follows, taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a block diagram illustrating a cache system with adaptive prefetch capabilities;

FIG. 2 is a graph illustrating an exemplary embodiment of a graph used in association with the system of FIG. 1; and

FIG. 3 is a flow chart illustrating a method for providing prefetching of web pages by a cache server using the system of FIG. 1.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 is a block diagram illustrating a cache system 10 with adaptive prefetch capabilities. System 10 comprises a client 12, a user 13, a network 14, an origin server 16, and a cache server 18.

Client 12 comprises any suitable general purpose or specialized computer operable to support execution of a web browser 20. Client 12 is coupled to network 14. User 13 comprises a human user or automated process associated with client 12 and web browser 20.

Browser 20 is executed on client 12 and comprises any suitable Hypertext Transfer Protocol (HTTP) client. In the disclosed embodiment, browser 20 comprises a web browser such as Internet Explorer® by Microsoft Corp. of Redmond, Wash., or Netscape Communicator by Netscape Communications Corp. of Mountain View, Calif. Browser 20 transmits and receives data over network 14. Browser 20 is operable to generate one or more requests 22.

Request 22 comprises a request for an item of content from origin server 16. More specifically, request 22 may use a uniform resource locator (URL). The URL identifies a particular origin server 16 by the Internet domain name associated with the origin server 16 and a web page 30 located at the origin server 16. The domain name and web page 30 identify the particular web page 30 request 22 is requesting. As used herein, an item of content (“content item”) indicates a particular element of content, such as a particular web page, while content refers generally to data to be retrieved. The requested content item may further comprise multiple items of content, for example, a web page with multiple graphical elements, but request 22 indicates a single content item while the remaining items of content associated with the requested content item are retrieved as a function of the requested content item. Content may comprise static or dynamic audio data, video data, text data, multimedia data, hypertext markup language (HTML) data, binary data and any other suitable types of data capable of being used by client 12 or displayed by web browser 20. In the disclosed embodiment, requests 22 are HTTP requests for HTML data, such as a web page.

Network 14 comprises any suitable data network system for communicating data between computer systems. For example, network 14 may comprise the Internet, an asynchronous transfer mode (ATM) network, an Ethernet network, a Transmission Control Protocol/Internet Protocol (TCP/IP) network, an intranet or any other suitable computer networking technologies in any combination. For purposes of teaching the present invention, an exemplary embodiment will be described where network 14 comprises the publicly accessible interconnection of computer networks commonly known as the Internet.

Origin server 16 comprises any suitable hardware and/or software executing on a computer for receiving and responding to requests 22. Origin server 16 may comprise a single computer executing software or may comprise a plurality of computers each executing software. In the disclosed embodiment, origin server 16 comprises an HTTP server which may also be known as a web server. Origin server 16 may additionally support other protocols such as the file transfer protocol (FTP). Origin server 16 may retrieve information from local data sources and/or remote data sources in response to requests 22. Origin server 16 may be operable to retrieve static content, such as pre-written text files, images and web pages, from the data sources in response to requests 22. Origin server 16 may also be operable to generate new, dynamic content, for example, by dynamically creating web pages based on content stored at the data sources, in response to requests 22. For example, origin server 16 may generate a new web page using a common gateway interface (CGI) script, generate a new web page from the result of a structured query language (SQL) request and perform other suitable content generation functions in response to requests 22. Origin server 16 may also be operable to generate executable software, such as applications and applets, in response to requests for data. For example, origin server 16 may generate a Java applet in response to an appropriate request 22.

Origin server 16 also comprises one or more web pages 30. Web pages 30 each comprise a content item identified by a URL and having one or more items of content associated with it. For example, a particular web page 30 may have graphics, text, animations, applets, and other types of data and multimedia information associated with it. Origin server 16 also comprises a requested web page 32. Requested web page 32 comprises a particular one of the web pages 30 requested by request 22.

Cache server 18 caches content for transmission to web browsers 20 in response to requests 22. Cache server 18 responds to requests 22 from browser 20 by intercepting request 22 and providing the requested web page or other content item to browser 20 using network 14. By responding to requests 22 at cache server 18, the processing and network load at origin server 16 is decreased and user 13 receives more efficient and faster service. Cache servers 18 cache web pages 30 from origin server 16. Cache servers 18 provide current, cached content items originally available from origin server 16 to browser 20 in response to requests 22. In the disclosed embodiment, a single cache server 18 is shown as communicating with a single origin server 16, however, multiple cache servers 18 may be used and be operable to communicate with and provide service to a plurality of origin servers 16.

Cache server 18 further comprises a prefetch module 40. Prefetch module 40 comprises a suitable combination of software and/or hardware operable to retrieve web pages 30 from origin server 16. Prefetch module 40 operates to generate a logical graph 42 associated with an origin server 16 and use the graph 42 to determine which web pages 30 to prefetch from origin server 16 to cache server 18. More specifically, graph 42 is a logical construct that allows examination and relative weighting of relationships between web pages 30 on a particular origin server 16. Graph 42 is described in more detail in association with FIG. 2. Graph 42 comprises a directed graph having one or more weights associated with edges connecting nodes in the graph 42. Each node comprises a web page and each edge comprises a link from one web page 30 to another web page 30.
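By way of illustration only, a directed graph of this kind might be held in memory as sketched below. The class names `Link` and `PrefetchGraph`, the use of URL paths as node identifiers, the domain name and the default weight of 1.0 are assumptions made for the sketch and are not taken from the disclosed embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """Directed edge standing for a hypertext link between two web pages."""
    source: str                      # page containing the hypertext link
    target: str                      # page the hypertext link points to
    transaction_weight: float = 1.0  # assumed default; importance to the origin server
    user_weight: float = 1.0         # assumed default; adjustment to request priority


@dataclass
class PrefetchGraph:
    """Directed graph of the web pages at one origin server (hypothetical layout)."""
    domain: str
    nodes: set = field(default_factory=set)
    links: dict = field(default_factory=dict)  # (source, target) -> Link

    def add_link(self, source, target, transaction_weight=1.0, user_weight=1.0):
        # Adding a link also registers both end points as nodes.
        self.nodes.update((source, target))
        link = Link(source, target, transaction_weight, user_weight)
        self.links[(source, target)] = link
        return link

    def outgoing(self, source):
        # Links leaving the given page, in insertion order.
        return [link for (src, _), link in self.links.items() if src == source]


# Illustrative values only; the weights shown in FIG. 2 are not reproduced here.
graph = PrefetchGraph("shop.example")
graph.add_link("/index", "/catalogue", transaction_weight=1.5, user_weight=1.0)
graph.add_link("/index", "/contact", transaction_weight=0.5, user_weight=0.1)
```

Keying the links by a (source, target) pair keeps one edge per hypertext link, which is one simple way to let both weights be looked up when a request arrives from a known origination page.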

Cache server 18 also comprises priority criteria 44. Priority criteria 44 are used by cache server 18 to associate a priority 46 with each request 22. Priority criteria 44 may be used by cache server 18 to determine priority 46 associated with request 22. For example, priority criteria 44 may associate priority 46 with request 22 based on the particular requested web page 32. For example, if requested web page 32 comprises a “buy” web page 30 at origin server 16, request 22 may be given a higher priority 46 than a request 22 for a “contact information” web page. By associating priorities with request 22, cache server 18 and origin server 16 may provide more efficient service to important requests while supplying relatively slower service to less important requests 22. Priority 46 comprises an indication of the importance of a particular request 22. Priority 46 may comprise an integer, a real number, an alphanumeric value, or any other suitable value operable to indicate a relative priority. Priority 46 may also indicate a relative increase or decrease to a priority already associated with request 22.

Cache server 18 may also utilize a prefetch threshold 48. Prefetch threshold 48 comprises a data construct operable to indicate which web pages 30 may be retrieved by prefetch module 40. More specifically, as cache server 18 becomes increasingly busy, cache server 18 may use prefetch threshold 48 to impose a cut-off point when determining which web pages 30 to prefetch. Prefetch threshold 48 is described in more detail in association with FIG. 2.

Cache server 18 may also comprise site criteria 50. Site criteria 50 comprise configuration information associated with origin server 16. For example, site criteria 50 may indicate how graph 42 is to be generated for origin server 16 as well as other information associated with graph 42 and origin server 16.

In operation, user 13 at client 12 generates request 22 using browser 20 for content from origin server 16. More specifically, request 22 requests requested web page 32 from origin server 16. Cache server 18 intercepts request 22 from web browser 20 before request 22 reaches origin server 16. For example, cache server 18 may intercept request 22 by having a domain name service (DNS) server associated with origin server 16 direct request 22 from the Internet domain associated with origin server 16 to cache server 18. Stated another way, request 22 addressed to the domain associated with origin server 16 may be routed to cache server 18 through the operation of a DNS server.

After receiving request 22, cache server 18 determines whether requested web page 32 is presently available at cache server 18. As used herein, a web page is “available” at cache server 18 when an unexpired copy of web page 30 presently exists at cache server 18. An unexpired web page 30 at cache server 18 comprises a copy of a web page 30 available from origin server 16 that is the same as the web page 30 originally available from origin server 16. Stated another way, an unexpired web page at cache server 18 comprises a copy of a web page 30 on origin server 16 which has not changed at origin server 16 since the copy was made at cache server 18. A number of conventional suitable methods may be used to synchronize and expire web pages 30 at cache server 18.

If requested web page 32 is available at cache server 18, then cache server 18 communicates requested web page 32 to client 12. If requested web page 32 is not available at cache server 18, then cache server 18 retrieves requested web page 32 from origin server 16 and communicates requested web page 32 to client 12. Cache server 18 also determines whether requested web page 32 retrieved from origin server 16 is cacheable, and, if requested web page 32 is cacheable, caches requested web page 32 at cache server 18.
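The handling of a request described above can be sketched as follows. This is a minimal illustration under stated assumptions: the cache is a plain dictionary, expiration of cached copies is not modeled, and the function name `serve` together with the `fetch_from_origin` and `is_cacheable` callables are hypothetical stand-ins for whatever mechanisms a particular cache server uses.

```python
def serve(url, cache, fetch_from_origin, is_cacheable):
    """Return (page, source) for a request, consulting the cache first.

    cache             -- dict mapping URL to cached page content (assumed layout)
    fetch_from_origin -- callable retrieving the page from the origin server
    is_cacheable      -- callable deciding whether a retrieved page may be cached
    """
    page = cache.get(url)
    if page is not None:
        # The page is available at the cache server.
        return page, "cache"
    # Not available: retrieve from the origin server.
    page = fetch_from_origin(url)
    if is_cacheable(url, page):
        cache[url] = page
    return page, "origin"


# Usage with an in-memory stand-in for the origin server.
origin_pages = {"/index": "<html>index</html>"}
cache = {}
first = serve("/index", cache, origin_pages.__getitem__, lambda u, p: True)
second = serve("/index", cache, origin_pages.__getitem__, lambda u, p: True)
```

In this sketch the first call is answered from the origin stand-in and populates the cache, and the second call is answered from the cache.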

After communicating requested web page 32 to client 12, cache server 18 uses prefetch module 40 to determine which web pages 30, if any, to prefetch from origin server 16. By prefetching web pages 30 from origin server 16, cache server 18 is attempting to provide increased responsiveness to user 13. Prefetching web pages 30 comprises retrieving web pages 30 from origin server 16 before the web pages 30 are requested by user 13. Instead of reacting to requests 22 and caching only requested web pages 32, prefetch module 40 uses graph 42 to attempt to predict which web pages 30 user 13 is likely to select next. Prefetch module 40 can then retrieve web pages 30 from origin server 16 before user 13 requests the web page 30. User 13 then experiences decreased delay when retrieving web pages 30 because the web pages have already been cached at cache server 18. When origin server 16 is a popular site and multiple cache servers 18 are used, a significant performance increase may be experienced by user 13 as the processing and network load at origin server 16 is decreased and spread among cache servers 18. For example, a prefetch of a “check out” page or a “further information” page for an item may increase the performance experienced by the user when the user requests these prefetched pages. The particular web pages prefetched may be selected as they are relatively more important to origin server 16 than other web pages because users may tend to be more likely to make a purchase when the prefetched web pages are requested by the user.

Cache server 18 then examines graph 42 associated with origin server 16 to which request 22 is directed. Graph 42 may modify priority 46 associated with request 22. For example, priority 46 of request 22 may be increased or decreased. By changing priority 46 associated with request 22, prefetch module 40 may use information available from graph 42 to provide increased service to users 13 requesting high priority web pages 30 and decreased service to users 13 requesting low priority web pages 30. In general, graph 42 allows priority 46 to be changed based on the particular requested web page 32 user 13 is requesting and web page 30 from which user 13 selected web page 32.

In addition, prefetch module 40 may pre-load web pages linked to requested web page 32 based on graph 42, priority 46 and threshold 48. More specifically, prefetch module 40 determines whether related web pages are already cached at cache server 18 and may then retrieve one or more uncached related web pages 30.

FIG. 2 is a graph illustrating an exemplary embodiment of graph 42. Graph 42 comprises a plurality of nodes 130A, 130B, 130C, 130D, 130E, 130F, 130G, 130H, and 130I, and a plurality of links 100A, 100B, 100C, 100D, 100E, 100F, 100G, 100H, 100I, and 100J. For increased clarity, links may be referred to generically as “link 100” while links 100A-J represent the particular links shown in FIG. 2. Similarly, nodes may be referred to generically as “node 130” while nodes 130A-I represent the particular nodes in FIG. 2. Each node 130A-I has a respective associated web page 30A, 30B, 30C, 30D, 30E, 30F, 30G, 30H and 30I. For example, node 130A has an associated web page 30A representing an index page. Each link 100 is respectively associated with a hypertext link between web pages 30. For example, link 100A between node 130A and node 130B indicates a link from web page 30A to web page 30B.

Each link 100 also comprises an associated transaction weight 102 and an associated user weight 104. Transaction weight 102 comprises an indication of the importance of the link to an administrator associated with origin server 16. More specifically, transaction weight 102 indicates the relative importance of hypertext links associated with links 100 in graph 42. Transaction weight 102 may be used by prefetch module 40 to determine which pages 30 to prefetch and in what order to prefetch web pages 30. Transaction weight 102 may comprise a numeric or other indication of the weight. In one embodiment, transaction weight 102 comprises a real number.

User weight 104 comprises an indication of how to modify the priority of request 22 based on the link 100 associated with request 22. More specifically, the priority associated with user 13 may be increased or decreased based on user weight 104. The increase or decrease may be determined by the administrator associated with origin server 16 based on the importance of the link 100. For example, link 100 between node 130A and node 130B indicates a user weight of 1.0 which may be used to indicate no change in the user's priority. For another example, link 100 between index page 30A and contact page 30C indicates a user weight 104 of 0.1 which may indicate a decrease in the priority associated with user 13 because the administrator associated with origin server 16 does not consider contact page 30C to be a high priority page 30. Criteria 50 may be used to indicate weights 102 and 104 for a particular origin server 16.

User weight 104 may comprise any suitable indication of the priority associated with link 100. In the exemplary embodiment of FIG. 2, user weight 104 is a real number indicating a magnitude of change in priority 46 by link 100.

Graph 42 may be used to represent the organization of web pages 30 at an origin server 16. Using graph 42, module 40 can determine how important particular links 100 and web pages 30 are to origin server 16. More specifically, transaction weight 102 may be used to determine the importance of web pages 30 to origin server 16. This allows prefetch module 40 to prefetch important web pages 30 so that users 13 experience increased performance with respect to particular portions of origin server 16. For example, if origin server 16 is paying for caching services from cache server 18 based on the amount of data cached by cache server 18, then transaction weight 102 may be used by origin server 16 to restrict prefetching of web pages 30 to important web pages 30 associated with origin server 16, such as a product purchase confirmation page.

User weight 104 may also be used to represent the importance of a web page 30 or link 100. User weight 104 indicates the priority level for servicing request 22. For example, priority 46 associated with request 22 may be low for a particular user 13 because that user 13 browses often, but rarely buys, and user weight 104 may be used to raise priority 46 when user 13 selects a “buy product” link.

When user 13 selects a link 100, user weight 104 may modify priority 46 associated with request 22. More specifically, priority 46 associated with request 22 may be adjusted up or down based on user weight 104 which allows link 100 to specifically prioritize requests 22. For example, user weight 104 of 1.0 associated with link 100A may indicate no change in priority 46 while user weight 104 of 0.1 on link 100B may decrease priority 46 because contact page 30C is considered to be less important to an administrator associated with origin server 16 than a user wishing to view catalogue page 30B.
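One possible reading of these examples is that the user weight acts as a multiplier on the priority, so that 1.0 leaves the priority unchanged and 0.1 lowers it. The description does not state the arithmetic, so the multiplicative rule below, the function name `adjust_priority` and the sample priority values are assumptions made purely for illustration.

```python
def adjust_priority(priority, user_weight):
    """Scale a request's priority by the user weight of the selected link.

    Assumes a multiplicative rule: a weight of 1.0 leaves the priority
    unchanged, a weight below 1.0 lowers it and a weight above 1.0 raises it.
    """
    return priority * user_weight


base_priority = 5.0                                 # hypothetical starting priority
catalogue = adjust_priority(base_priority, 1.0)     # unchanged
contact = adjust_priority(base_priority, 0.1)       # decreased
buy = adjust_priority(base_priority, 2.0)           # raised, e.g. for a "buy product" link
```

An additive adjustment would fit the text equally well; the multiplicative form is chosen here only because it makes 1.0 the natural "no change" value.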

For example, request 22 may request index page 30A from origin server 16. After index page 30A has been returned to client 12, prefetch module 40 may then examine graph 42. If no graph 42 exists for origin server 16 associated with index page 30A, then prefetch module 40 may generate a new graph 42 for origin server 16. Generating a new graph 42 may be done incrementally or all-at-once. As origin server 16 may support a large number of web pages 30, the all-at-once approach may impose a significant burden on the processing capabilities and network bandwidth at origin server 16. For example, cache server 18 may have to retrieve a substantial portion of the web pages 30 at origin server 16 in order to determine the relationships between the web pages 30 at origin server 16 and generate graph 42.

Origin server 16 may also choose to build graph 42 incrementally. For example, an incremental build of graph 42 may comprise only adding web pages 30 associated with origin server 16 to graph 42 that are linked to a retrieved web page 30. Referring to FIG. 2, when web page 30E is retrieved for the first time, the incremental build of graph 42 would then add web pages 30I and 30F to graph 42.
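An incremental build of this kind can be sketched by extracting the hypertext links from each retrieved page and adding only those targets to the graph. The sketch below uses the Python standard library HTML parser; the dictionary layout of the graph, the function name `add_page_incrementally`, the default weights and the example URLs are all assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every anchor tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def add_page_incrementally(graph, page_url, html):
    """Add a retrieved page and the pages it links to.

    graph maps a page URL to a dict of {target URL: weights}; this layout
    is an assumption of the sketch.
    """
    parser = LinkExtractor()
    parser.feed(html)
    origin_host = urlparse(page_url).netloc
    edges = graph.setdefault(page_url, {})
    for href in parser.hrefs:
        target = urljoin(page_url, href)
        if urlparse(target).netloc != origin_host:
            continue  # only pages at the same origin server are graphed
        graph.setdefault(target, {})  # new node for the linked page
        edges.setdefault(target, {"transaction_weight": 1.0, "user_weight": 1.0})
    return graph


# Hypothetical page standing in for web page 30E, linking to 30F and 30I.
html = '<a href="/f.html">F</a> <a href="i.html">I</a> <a href="http://other.example/x">X</a>'
graph = add_page_incrementally({}, "http://shop.example/e.html", html)
```

Because existing entries are left untouched by `setdefault`, retrieving the same page again does not reset weights that have since been updated.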

In addition, historical information may be used to build graph 42 in association with the incremental or fixed-interval methods of building graph 42. For example, logs created by origin server 16 may indicate which URLs and/or web pages 30 have been retrieved. Also, the logs may indicate when the web pages 30 have been retrieved which allows the order in which web pages 30 are retrieved to be determined.

In the disclosed embodiment, origin servers 16 are differentiated based on the domain name associated with the origin server 16 and a distinct graph 42 may be associated with each domain. Alternatively, prefetch module 40 may be configured to generate graphs 42 at any desired level of granularity, such as at the sub-domain level or the global top level domain (gTLD) level.

Prefetch module 40 then determines whether to prefetch catalogue page 30B and contact page 30C linked to index page 30A by links 100A and 100B respectively. Prefetch module 40 examines transaction weight 102 associated with links 100A and 100B. Any other suitable techniques may be used to determine which pages 30 to prefetch. Prefetch module 40 may then determine, based on transaction weight 102, whether to retrieve catalogue page 30B, contact page 30C or neither. More specifically, prefetch module 40 compares transaction weights 102 respectively associated with links 100A and 100B. Prefetch module 40 then determines whether transaction weight 102 for links 100A and 100B exceeds prefetch threshold 48. In FIG. 2, transaction weights 102 are shown as real numbers, however, integer values or other values may be used. Prefetch module 40 may also use transaction weights 102 as a modifier to another value. For example, cache server 18 and prefetch module 40 may maintain prefetch threshold 48 for individual origin servers 16.

Prefetch threshold 48 may be based on the processing load, current bandwidth available or other relevant metrics currently being experienced by cache server 18. For example, when cache server 18 is experiencing heavy traffic, prefetch threshold 48 may increase so that fewer web pages 30 are being prefetched. Prefetch threshold 48 may also comprise multiple values, each individually associated with particular origin servers 16. For example, origin server 16 may want only high transaction weight items to be prefetched. For another example, prefetch threshold 48 for a particular origin server 16 may change based on the load currently being experienced by origin server 16. By decreasing the number of web pages 30 to be prefetched, the processing load at cache server 18 or origin server 16 may be decreased. For example, prefetch threshold 48 may be 1.0, indicating that link 100A has a transaction weight 102 high enough for retrieval of catalogue page 30B, while link 100B does not have a transaction weight 102 high enough for prefetching of contact page 30C. Depending on the configuration of prefetch module 40, other web pages 30, such as 30D-I, may also be prefetched by prefetch module 40.
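The comparison of transaction weights against the prefetch threshold can be sketched as below. The weight values, the linear relation between load and threshold, and the names `select_prefetch` and `threshold_for_load` are assumptions for illustration; the description leaves the exact relation between load and threshold open.

```python
def select_prefetch(link_weights, threshold, cached):
    """Return uncached link targets whose transaction weight exceeds the
    threshold, ordered from highest to lowest weight.

    link_weights -- dict mapping target page to transaction weight
    cached       -- set of pages already held at the cache server
    """
    candidates = [
        (weight, target)
        for target, weight in link_weights.items()
        if weight > threshold and target not in cached
    ]
    candidates.sort(reverse=True)
    return [target for _, target in candidates]


def threshold_for_load(base_threshold, load):
    """Raise the threshold as load grows (load in the range 0.0 to 1.0).

    The linear scaling is an assumed policy, not one taken from the text.
    """
    return base_threshold * (1.0 + load)


# Hypothetical weights for the links leaving the index page.
weights = {"/catalogue": 1.5, "/contact": 0.5}
light = select_prefetch(weights, threshold_for_load(1.0, 0.0), cached=set())
heavy = select_prefetch(weights, threshold_for_load(1.0, 0.8), cached=set())
```

With these illustrative numbers, only the catalogue page clears the threshold under light load, and nothing is prefetched once heavy load raises the threshold to 1.8.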

Weights 102 and 104 may also change over time. When graph 42 is initially generated for an origin server 16, default or initial weights 102 and 104 may be assigned to links 100. As users 13 retrieve web pages 30 from origin server 16, criteria 50 associated with origin server 16 may indicate how to update weights 102 and/or 104 based on the pages 30 retrieved by users 13. For example, criteria 50 may indicate that weights 102 and/or 104 be increased when a particular page is retrieved a certain number of times. For another example, criteria 50 may indicate that a link 100 which has not been selected for a certain period of time has the associated transaction weight 102 decreased. Also, criteria 50 may place increased importance on web pages 30 that result in a particular outcome. For example, on an electronic commerce web site, a web page 30 which results in a final “buy” transaction may be given increased weight because an item has been purchased previously from that web page 30. In general, a variety of suitable criteria 50 may be used to determine how to increase and/or decrease weights 102 and/or 104 for particular origin servers 16.
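Two of the update criteria mentioned above, raising a weight after a number of retrievals and lowering it after a period without selection, can be sketched as follows. The class `LinkStats`, the function names, the step of 0.1, the count of 100 retrievals and the one-day idle period are hypothetical parameters chosen for the sketch.

```python
from dataclasses import dataclass


@dataclass
class LinkStats:
    """Per-link bookkeeping used to apply update criteria (assumed layout)."""
    transaction_weight: float = 1.0
    hits: int = 0
    last_selected: float = 0.0  # time of the most recent selection, in seconds


def record_selection(stats, now, hits_per_step=100, step=0.1):
    """Count a selection and raise the weight after every hits_per_step hits."""
    stats.hits += 1
    stats.last_selected = now
    if stats.hits % hits_per_step == 0:
        stats.transaction_weight += step


def decay_if_idle(stats, now, max_idle=86400.0, step=0.1, floor=0.0):
    """Lower the weight of a link not selected within max_idle seconds."""
    if now - stats.last_selected > max_idle:
        stats.transaction_weight = max(floor, stats.transaction_weight - step)


stats = LinkStats()
for second in range(100):
    record_selection(stats, now=float(second))
weight_after_hits = stats.transaction_weight   # raised once after 100 selections
decay_if_idle(stats, now=99.0 + 86401.0)
weight_after_idle = stats.transaction_weight   # lowered again after a day idle
```

The same pattern could be applied to the user weight, or extended with a criterion that raises the weight of links leading to a completed purchase.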

FIG. 3 is a flow chart illustrating a method for providing prefetching of web pages 30 by a cache server 18. The method begins at step 200 where user 13 generates a request 22 for requested web page 32 using web browser 20. Next, at step 202, request 22 is communicated over network 14 and intercepted by cache server 18. Then, at decisional step 204, cache server 18 determines whether requested web page 32 is cached. If requested web page 32 is not cached then the NO branch of decisional step 204 leads to step 206 where requested web page 32 is retrieved from origin server 16. Proceeding to decisional step 208, cache server 18 determines whether requested web page 32 is cacheable. If requested web page 32 is cacheable then the YES branch of decisional step 208 leads to step 210. At step 210, the requested web page 32 is cached at cache server 18. If cache server 18 determines at step 208 that requested web page 32 is not cacheable, then the NO branch of step 208 leads to step 212.

Returning to step 204, if requested web page 32 was already cached at cache server 18, then the YES branch of decisional step 204 leads to step 212. At step 212, the requested web page 32 is communicated over network 14 to client 12 for display by web browser 20 to user 13.

Next, at decisional step 220, prefetch module 40 determines whetherorigin server 16 is being graphed incrementally or on fixed intervals.More specifically, at decisional step 220, prefetch module 40 determineshow graph 42 is to be updated for origin server 16. Incrementallyupdating graph 42 may comprise adding links 100 and nodes 130 as users13 retrieve web pages 30 from the origin server 16 associated with graph42. If updating of graph 42 is to be performed incrementally, then theYES branch of decisional step 220 leads to decisional step 222.

At decisional step 222, prefetch module 40 determines whether a graph 42 currently exists for origin server 16. If no graph 42 is currently associated with origin server 16 then the NO branch of decisional step 222 leads to step 224. At step 224, a portion of graph 42 is generated. More specifically, a first node 130 is generated for graph 42 and associated with requested web page 32. Referring to FIG. 2, if the requested web page 32 was index page 30A, index page 30A would become the first node 130A of graph 42. In general, criteria 50 associated with origin server 16 may indicate where to begin building graph 42; retrieved web page 32 may be used as the starting point, or any other suitable starting location may be used.

Returning to step 222, if graph 42 does exist for origin server 16 then the YES branch of decisional step 222 leads to step 226. At step 226, requested web page 32 is added to graph 42 associated with origin server 16. If requested web page 32 already exists in graph 42, then a new node may not be added. Links 100 associated with the newly added web page 32 are also added to graph 42. If requested web page 32 was already in graph 42, then requested web page 32 may be examined to determine if the links 100 associated with the retrieved web page 32 need to be updated. Referring to FIG. 2, if web page 30R has just been added to graph 42, then links 100C and 100D are added at step 226. Next, at step 228, weights 102 and 104 associated with links 100 are updated. More specifically, links 100 associated with the retrieved web page 30 may be updated in response to a retrieval of the web page 30. For example, links 100 to the retrieved web page 30 may have their transaction weight 102 increased because the web page 30 to which link 100 refers has become more popular. Referring to the example in FIG. 2, if web page 30D is retrieved, link 100C may have transaction weight 102 and/or user weight 104 increased or decreased in response to the retrieval of web page 30D. An administrator associated with origin server 16 and/or an administrator associated with cache server 18 may determine the criteria by which weights 102 and 104 are updated. For example, the administrator may configure prefetch module 40 to increase weights 102 and/or 104 by 0.1 after a particular web page 30 has been downloaded 100 times. More specifically, nodes 130 associated with web pages 30 which have not yet been added to graph 42 may be added in step 242. Also, changes to the organization and number of web pages 30 at origin server 16 may be handled at step 226. For example, new web pages 30 may be added, old web pages 30 may be deleted, and links 100 between web pages 30 may change.

For example, user 13 retrieves an origination web page and module 40 generates an origination node in graph 42 and associates the origination node with the origination web page. Hypertext links associated with the origination web page are added as links 100 from the origination node. One or more further web pages associated with the hypertext links may then be added to graph 42 as nodes. More specifically, links 100 are added from the origination node to the nodes associated with the further web pages linked to from the origination node. Weights 102 and 104 may then be associated with links 100 based on criteria 50.
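By way of illustration only, the incremental construction of graph 42 described above may be sketched in Python as follows. The `Graph` and `Link` classes, the nested-dictionary representation, and the default weights of 0.5 are assumptions made for this example; the description leaves the data structure and the initial weights to criteria 50.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Link:
    """Stand-in for a link 100 with weights 102 and 104 (names are illustrative)."""
    transaction_weight: float
    user_weight: float


@dataclass
class Graph:
    """Stand-in for graph 42: maps each page URL to its outgoing links by target URL."""
    nodes: Dict[str, Dict[str, Link]] = field(default_factory=dict)

    def add_page(self, url: str, hyperlinks: Iterable[str],
                 default_tw: float = 0.5, default_uw: float = 0.5) -> None:
        """Add a retrieved page and its hypertext links, creating nodes as needed."""
        outgoing = self.nodes.setdefault(url, {})  # no new node if already present
        for target in hyperlinks:
            self.nodes.setdefault(target, {})      # node for the linked page
            if target not in outgoing:             # keep weights of existing links
                outgoing[target] = Link(default_tw, default_uw)
```

Calling `add_page` a second time for the same page leaves existing links and their weights in place and adds only links that were not previously present. Removal of links that no longer appear on a page is omitted from this sketch.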

Proceeding to step 230, prefetch module 40 determines the next web page 30 to prefetch. Then, at step 232, the selected page is prefetched. More specifically, prefetch module 40 may maintain prefetch threshold 48 and retrieve web pages 30 linked to the retrieved web page 32 and having a transaction weight 102 greater than prefetch threshold 48. Next, at decisional step 234, prefetch module 40 determines whether more links 100 remain to be prefetched. If more web pages 30 exist to be prefetched then the YES branch of decisional step 234 returns to step 230. If no more web pages 30 currently exist to be prefetched then the NO branch of decisional step 234 is followed and the method ends. Prefetch module 40 may determine whether further web pages 30 remain to be prefetched by determining whether any links 100 are associated with the current web page 30 which have not yet been considered for prefetching. In general, any suitable technique may be used to determine if more web pages 30 exist to be prefetched.
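By way of illustration only, the loop of steps 230 through 234 may be sketched in Python as follows. The function names, the hypothetical `fetch` callable, the choice to skip pages already held in the cache, and the ordering of candidates from highest to lowest transaction weight are assumptions made for this example; the description requires only that the transaction weight be greater than prefetch threshold 48.

```python
from typing import Callable, Dict, List


def select_prefetch_targets(link_weights: Dict[str, float],
                            prefetch_threshold: float,
                            cache: Dict[str, bytes]) -> List[str]:
    """Return linked pages whose transaction weight is greater than the threshold.

    link_weights maps each linked URL to its transaction weight 102.
    Pages already cached are skipped (an assumption of this sketch).
    """
    candidates = [(weight, url) for url, weight in link_weights.items()
                  if weight > prefetch_threshold and url not in cache]
    # Assumed ordering: highest weight first, ties broken by URL.
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [url for _, url in candidates]


def prefetch_linked_pages(link_weights: Dict[str, float],
                          prefetch_threshold: float,
                          cache: Dict[str, bytes],
                          fetch: Callable[[str], bytes]) -> List[str]:
    """Prefetch every selected page into the cache and return the URLs fetched."""
    fetched = []
    for url in select_prefetch_targets(link_weights, prefetch_threshold, cache):
        cache[url] = fetch(url)  # step 232: prefetch the selected page
        fetched.append(url)
    return fetched               # loop exhausted: NO branch of step 234
```

A link whose transaction weight equals the threshold is not prefetched in this sketch, because the comparison is strictly greater than.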

Returning to step 220, if graph 42 is not to be updated in real time then the NO branch of decisional step 220 leads to step 240. At step 240, links 100 associated with retrieved web page 32 are followed until origin server 16 has been graphed. For example, when origin server 16 contracts for service from cache server 18, prefetch module 40 may build graph 42 by starting at an index page 30A associated with origin server 16 and recursively traversing all links 100 associated with index page 30A to build graph 42. Any suitable technique may be used for traversing links 100 and handling loops and other items. Then, at step 242, graph 42 is updated based on retrieved web page 32. More specifically, nodes 130 associated with web pages 30 which have not yet been added to graph 42 may be added in step 242. Also, links 100 between web pages 30 may be added at step 242 to graph 42. Step 242 may be performed in order to handle changes to the organization and number of web pages 30 at origin server 16. For example, new web pages 30 may be added, old web pages 30 may be deleted, and links 100 between web pages 30 may change. Depending on criteria 50 associated with origin server 16, the update to graph 42 may begin at retrieved web page 32 and continue to web pages 30 linked to web page 32, may begin at a predetermined web page 30, such as web page 30A in FIG. 2, or at some other suitable web page 30 associated with origin server 16. Proceeding to step 244, links 100 without weights 102 and/or 104 may be assigned a default weight as indicated in criteria 50 as configured by an administrator associated with origin server 16 and/or cache server 18. As links 100 and web pages 30 are added or removed from graph 42, default weights 102 and 104 may be associated with newly added links 100 for use with prefetch module 40.
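By way of illustration only, the traversal of step 240 and the default weight assignment of step 244 may be sketched in Python as follows. This sketch uses an iterative breadth-first traversal with a record of visited pages in place of recursion, which is one of many suitable techniques for handling loops. The function name, the hypothetical `get_links` callable that returns the URLs linked from a page, and the default weight of 0.5 are assumptions made for this example.

```python
from collections import deque
from typing import Callable, Dict, Iterable


def build_graph(index_url: str,
                get_links: Callable[[str], Iterable[str]],
                default_weight: float = 0.5) -> Dict[str, Dict[str, float]]:
    """Graph a site starting at its index page.

    Returns a mapping from each page URL to its outgoing links, where each
    link carries an assumed default transaction weight.
    """
    graph: Dict[str, Dict[str, float]] = {}
    queue = deque([index_url])
    while queue:
        url = queue.popleft()
        if url in graph:  # already graphed: this check breaks link loops
            continue
        graph[url] = {target: default_weight for target in get_links(url)}
        queue.extend(t for t in graph[url] if t not in graph)
    return graph
```

Because each page is graphed at most once, pages that link back to the index page or to one another do not cause the traversal to repeat.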

System 10 provides the capability for prefetching web pages from an origin server so that a user realizes increased performance. A cache server stores the prefetched web pages so that the user may receive requested web pages more quickly. For example, the cache server may be located “closer” to the user on the Internet so as to add less network related delay in responding to the user's request for a web page. By proactively retrieving web pages from the origin server, web pages may be cached before a user has ever requested the web page. In addition, by associating a transaction weight with links between web pages on the origin server, the importance of particular web pages and the order of the prefetching of the web pages may be controlled. Also, by adjusting a prefetch threshold associated with an origin server, some web pages may be prefetched while others are not based on the transaction weight. For example, an origin server being served by multiple cache servers may not want all of the web pages associated with the origin server to be prefetched and the origin server may set its prefetch threshold to exclude the prefetching of web pages with a low transaction weight.

A request for a web page may have a priority associated with the request, for example, to indicate the importance of the request or the user who generated the request. A user weight may also be associated with links between web pages at the origin server to change and/or vary the priority associated with a request. For example, a request with a low priority may be given a higher priority because of the particular web page the request is requesting.
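By way of illustration only, one possible way in which a user weight might raise the priority of a request is sketched in Python below. The description does not specify how the user weight and the priority are combined; taking the larger of the two values, the function names, and the representation of a request as a tuple are all assumptions made for this example.

```python
import heapq
from typing import List, Tuple


def effective_priority(request_priority: float, user_weight: float) -> float:
    """Assumed rule: a request is raised to the user weight of the link it
    follows whenever that weight is higher than the request's own priority."""
    return max(request_priority, user_weight)


def order_requests(requests: List[Tuple[str, float, float]]) -> List[str]:
    """Order requests from highest to lowest effective priority.

    Each request is a (url, request_priority, user_weight) tuple. Requests
    with equal effective priority keep their arrival order.
    """
    heap = [(-effective_priority(priority, weight), arrival, url)
            for arrival, (url, priority, weight) in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]
```

Under this assumed rule, a low priority request for a page reached through a link with a high user weight is served ahead of a higher priority request for a page reached through a link with a low user weight.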

In addition, the user and transaction weights may change depending on criteria specified by an administrator associated with the origin server. For example, the transaction weight and/or user weight associated with a hypertext link may be increased in response to a particular web page being retrieved. For another example, the transaction weight and/or user weight associated with a hypertext link may be decreased in response to a particular web page not being retrieved for a predetermined period of time.

Other changes, substitutions and alterations are also possible without departing from the spirit and scope of the present invention, as defined by the following claims.

What is claimed is:
 1. A method for prefetching web pages, comprising: retrieving a first web page at a cache server in response to a request for the first web page; assigning a transaction weight to each of one or more second web pages linked to the retrieved first web page, the transaction weight indicating a relative importance of a particular web page by a source of the particular web page compared to other web pages at the source; determining whether to prefetch the one or more second web pages linked to the retrieved first web page, wherein a second web page is prefetched when its transaction weight exceeds a prefetch threshold; prefetching, prior to a request, those second web pages linked to the retrieved first web page whose transaction weight exceeds the prefetch threshold.
 2. The method of claim 1, wherein the source of the one or more second web pages is one or more origin servers, each origin server having a different prefetch threshold associated therewith.
 3. The method of claim 1, further comprising: dynamically adjusting the prefetch threshold based on a processing load experienced at the cache server.
 4. The method of claim 1, further comprising: dynamically adjusting the prefetch threshold based on a current bandwidth available to the cache server.
 5. The method of claim 1, further comprising: increasing the prefetch threshold in response to increased traffic at the cache server.
 6. The method of claim 1, further comprising: dynamically adjusting the prefetch threshold based on a processing load experienced at the source.
 7. The method of claim 1, further comprising: increasing a transaction weight for a particular page to be prefetched in response to the particular page being previously retrieved a set amount of times.
 8. The method of claim 1, further comprising: decreasing a transaction weight for a particular page to be prefetched in response to the particular page not being retrieved over a defined time period.
 9. The method of claim 1, further comprising: determining whether to update the transaction weight associated with a particular page in response to retrieval of the particular page.
 10. A non-transitory computer readable storage medium including code for prefetching a web page, the code operable when executed to: retrieve a first web page for a cache server in response to a request for the first web page; assign a transaction weight to each of one or more second web pages linked to the retrieved first web page, the transaction weight indicating a relative importance of a particular web page by a source of the particular web page compared to other web pages at the source; determine whether to prefetch the one or more second web pages linked to the retrieved first web page, wherein a second web page is prefetched when its transaction weight exceeds a prefetch threshold; prefetch, prior to a request, those second web pages linked to the retrieved first web page whose transaction weight exceeds the prefetch threshold.
 11. The non-transitory computer readable storage medium of claim 10, wherein the code is further operable to: dynamically adjust the prefetch threshold based on any of a processing load experienced at the cache server and a current bandwidth available to the cache server.
 12. The non-transitory computer readable storage medium of claim 10, wherein the code is further operable to: dynamically adjust the prefetch threshold based on a processing load experienced at the source.
 13. The non-transitory computer readable storage medium of claim 10, wherein the code is further operable to: increase the prefetch threshold in response to increased traffic at the cache server.
 14. The non-transitory computer readable storage medium of claim 10, wherein the code is further operable to: increase or decrease a transaction weight for a particular page to be prefetched in response to any of the particular page being previously retrieved a set amount of times or not retrieved over a defined time period.
 15. The non-transitory computer readable storage medium of claim 10, wherein the code is further operable to: determine whether to update the transaction weight associated with a particular page in response to retrieval of the particular page.
 16. A system for prefetching a web page, comprising: means for retrieving a first web page for a cache server in response to a request for the first web page; means for assigning a transaction weight to each of one or more second web pages linked to the retrieved first web page, the transaction weight indicating a relative importance of a particular web page by a source of the particular web page compared to other web pages at the source; means for determining whether to prefetch the one or more second web pages linked to the retrieved first web page, wherein a second web page is prefetched when its transaction weight exceeds a prefetch threshold; means for prefetching, prior to a request, those second web pages linked to the retrieved first web page whose transaction weight exceeds the prefetch threshold.
 17. The system of claim 16, further comprising: means for dynamically adjusting the prefetch threshold based on any of a processing load experienced at the cache server, a current bandwidth available to the cache server, and a traffic level at the cache server.
 18. The system of claim 16, further comprising: means for increasing or decreasing a transaction weight for a particular page to be prefetched in response to any of the particular page being previously retrieved a set amount of times or not being retrieved over a defined time period.
 19. The system of claim 16, further comprising: means for dynamically adjusting the prefetch threshold based on a processing load experienced at the source.
 20. The system of claim 16, further comprising: means for determining whether to update the transaction weight associated with a particular page in response to retrieval of the particular page.